Maximizing the Cohesion is NP-hard

Abstract: We show that the problem of finding a set with maximum cohesion
in an undirected network is NP-hard.
Introduction

In [1], we introduced a new metric called the cohesion, which rates the
community-ness of a group of people in a social network from a sociological
point of view. Through a large-scale experiment on Facebook, we established
that the cohesion is highly correlated with the subjective user perception
of the communities. In this article, we show that finding a set of vertices with
maximum cohesion is NP-hard.



Notations 

Let $G = (V, E)$ be a graph with vertex set $V$ and edge set $E$, of size $n = |V| > 4$.
For all vertices $u \in V$, we write $d_G(u)$ the degree of $u$, or more simply $d(u)$¹. A
triangle in $G$ is a triplet of pairwise connected vertices.

For all sets of vertices $S \subseteq V$, let $G[S] = (S, E_S)$ be the subgraph induced
by $S$ on $G$. We write $m(S) = |E_S|$ the number of edges in $G[S]$, and $i(S) =
|\{(u, v, w) \in S^3 : (uv, vw, uw) \in E_S^3\}|$ the number of triangles in $G[S]$. We define
$o(S) = |\{(u, v, w), (u, v) \in S^2, w \in V \setminus S : (uv, vw, uw) \in E^3\}|$, the number of
outbound triangles of $S$, that is, triangles in $G$ which have exactly two vertices
in $S$.

Moreover, for all $(u, v)$ in $E$, let $\Delta(uv) = |\{w \in V : (uw, vw) \in E^2\}|$ be the
number of triangles the edge $uv$ belongs to in $G$.

Finally, we recall the definition of the cohesion of a set $S$ in $G$:

$$C(S) = \frac{i(S)^2}{\binom{|S|}{3}\,(i(S) + o(S))}$$


An example is given in Figure 1. The cohesion of a given set in $G$ can
naively be computed in $O(n^3)$ by listing all triangles in $G$ and counting those
inside and outbound to $S$.
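The naive computation can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the adjacency-dictionary input format, the function name `cohesion`, and the toy graph are assumptions made here, and a set with no inner triangle is given cohesion 0 by convention.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def cohesion(adj, S):
    """Cohesion of vertex set S in a graph given as {vertex: set_of_neighbors}."""
    S = set(S)
    inside = outbound = 0
    # Enumerate every vertex triple once: O(n^3) candidate triangles.
    for u, v, w in combinations(adj, 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            members = len({u, v, w} & S)
            if members == 3:
                inside += 1      # triangle entirely inside S: counts toward i(S)
            elif members == 2:
                outbound += 1    # exactly two vertices in S: counts toward o(S)
    if inside == 0:
        return Fraction(0)       # assumed convention: no inner triangle, cohesion 0
    return Fraction(inside ** 2, comb(len(S), 3) * (inside + outbound))


# Hypothetical toy graph: triangles abc and bcd lie inside S, triangle cde is outbound.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d", "e"},
    "d": {"b", "c", "e"},
    "e": {"c", "d"},
}
print(cohesion(toy, {"a", "b", "c", "d"}))  # 1/3
```

Exact rational arithmetic (`Fraction`) is used so that cohesion values can be compared without floating-point error.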




Figure 1: In this example, $i(S) = 2$ and $o(S) = 1$, thus $C(S) = \frac{4}{3\binom{|S|}{3}}$.

In this article we examine the problem of finding a set of vertices $S \subseteq V$ of
maximum cohesion, i.e. such that for all subsets $S' \subseteq V$, $C(S') \le C(S)$.

Outline 

We now proceed to prove that finding a set of vertices with maximum cohesion
in $G$ is NP-hard. We will first show in Section 1 that this problem is equivalent
to that of finding a connected set of vertices with maximum cohesion in $G$. The
decision problem associated to the latter is Connected-Cohesive.

¹ Here, as elsewhere, we drop the index referring to the underlying graph if the reference is
clear.

Then, we shall prove that Connected-Cohesive is NP-complete by reducing
Clique (problem GT19 in [2]) to it. From there we deduce that the optimization
problem of finding a set of vertices with maximum cohesion is NP-hard.

Problems 

1. Connected-Cohesive:

   Input: A graph $G = (V, E)$, a threshold $\lambda \in [0, 1]$

   Question: Is there a connected subset $S$ of $V$ such that $C(S) \ge \lambda$?

2. Clique:

   Input: A graph $G = (V, E)$, $k \in \mathbb{N}$, $k \le |V|$

   Question: Is there a subset $S$ of $V$ such that $|S| = k$ and the subgraph
   induced by $S$ is a clique?



1 A maximum cohesive group is connected 

In order to prove that a set of vertices with maximum cohesion in a given 
network is connected, we need the following lemma: 

Lemma 1.1. Let $S_1 \subseteq V$ and $S_2 \subseteq V$ be two disconnected sets of vertices
($(S_1 \times S_2) \cap E = \emptyset$). If $C(S_1) \le C(S_1 \cup S_2)$ then $C(S_2) > C(S_1 \cup S_2)$.

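Proof. Suppose $C(S_1) \le C(S_1 \cup S_2)$ and $C(S_2) \le C(S_1 \cup S_2)$. Given that $S_1$ and
$S_2$ are disconnected, $i(S_1 \cup S_2) = i(S_1) + i(S_2)$ and $o(S_1 \cup S_2) = o(S_1) + o(S_2)$.
We can then write:

$$\frac{i(S_1)^2}{\binom{|S_1|}{3}} \le (i(S_1) + o(S_1))\, C(S_1 \cup S_2) \qquad (1)$$

$$\frac{i(S_2)^2}{\binom{|S_2|}{3}} \le (i(S_2) + o(S_2))\, C(S_1 \cup S_2) \qquad (2)$$

By summing (1) and (2), we obtain:

$$\begin{aligned}
\frac{i(S_1)^2}{\binom{|S_1|}{3}} + \frac{i(S_2)^2}{\binom{|S_2|}{3}}
&\le (i(S_1) + o(S_1) + i(S_2) + o(S_2))\, C(S_1 \cup S_2) \\
&\le (i(S_1 \cup S_2) + o(S_1 \cup S_2))\, C(S_1 \cup S_2) \\
&\le \frac{(i(S_1) + i(S_2))^2}{\binom{|S_1| + |S_2|}{3}}
\end{aligned}$$

Furthermore, given that $|S_1|, |S_2| \ge 1$,

$$\binom{|S_1|}{3} + \binom{|S_2|}{3} < \binom{|S_1| + |S_2|}{3}$$

We then have:

$$\frac{i(S_1)^2}{\binom{|S_1|}{3}} + \frac{i(S_2)^2}{\binom{|S_2|}{3}} < \frac{(i(S_1) + i(S_2))^2}{\binom{|S_1|}{3} + \binom{|S_2|}{3}}$$

Which simplifies to:

$$\left( i(S_1) \binom{|S_2|}{3} - i(S_2) \binom{|S_1|}{3} \right)^2 < 0$$

Hence the contradiction. Therefore, for all disconnected $S_1, S_2 \subseteq V$:

$$C(S_1) \le C(S_1 \cup S_2) \Rightarrow C(S_2) > C(S_1 \cup S_2) \qquad \square$$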
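The simplification step in the proof above can be checked by hand. The shorthand below is introduced here only for readability and is not part of the original notation: $a = i(S_1)$, $b = i(S_2)$, $x = \binom{|S_1|}{3}$ and $y = \binom{|S_2|}{3}$, assuming $x, y > 0$.

```latex
\begin{align*}
\frac{a^2}{x} + \frac{b^2}{y} - \frac{(a+b)^2}{x+y}
  &= \frac{a^2 y (x+y) + b^2 x (x+y) - (a+b)^2 x y}{x y (x+y)} \\
  &= \frac{a^2 y^2 - 2 a b x y + b^2 x^2}{x y (x+y)} \\
  &= \frac{(a y - b x)^2}{x y (x+y)} \;\ge\; 0 .
\end{align*}
```

A strict inequality $\frac{a^2}{x} + \frac{b^2}{y} < \frac{(a+b)^2}{x+y}$ would therefore force $(a y - b x)^2 < 0$, which is impossible.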

Theorem 1.2. Let $S$ be a set of vertices of $G$ with the highest cohesion. Then $S$ is
connected.

Proof. Suppose $S$ is not connected, then there exist two disconnected subsets
$S_1, S_2 \subseteq S$ such that $S = S_1 \cup S_2$. Given that $S$ has maximum cohesion,
we have $C(S) \ge C(S_1)$. Thus, per Lemma 1.1, $C(S) < C(S_2)$ and $S$ does not
have the highest cohesion, hence the contradiction. □

Corollary 1.3. Per Theorem 1.2, the problem of searching for a set of vertices
with maximum cohesion is strictly equivalent to that of searching for a
connected set of vertices with maximum cohesion.
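Theorem 1.2 can be observed on a small instance by exhaustive search. The sketch below is only an illustration under assumptions made here (adjacency-dictionary input, the hypothetical helper names `cohesion`, `is_connected` and `max_cohesion_sets`, cohesion 0 for sets without inner triangle). It enumerates all subsets, so it is exponential and only usable on tiny graphs.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def cohesion(adj, S):
    """Cohesion of S, by listing all triangles of the graph."""
    S = set(S)
    inside = outbound = 0
    for u, v, w in combinations(adj, 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            members = len({u, v, w} & S)
            inside += members == 3
            outbound += members == 2
    if inside == 0:
        return Fraction(0)
    return Fraction(inside ** 2, comb(len(S), 3) * (inside + outbound))


def is_connected(adj, S):
    """True if the subgraph induced by the non-empty set S is connected."""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in (adj[u] & S) - seen:
            seen.add(v)
            stack.append(v)
    return seen == S


def max_cohesion_sets(adj):
    """Best cohesion value and every vertex set reaching it (brute force)."""
    best, winners = Fraction(-1), []
    for size in range(1, len(adj) + 1):
        for S in combinations(sorted(adj), size):
            c = cohesion(adj, S)
            if c > best:
                best, winners = c, [set(S)]
            elif c == best:
                winners.append(set(S))
    return best, winners


# Hypothetical example: two disjoint triangles.
two_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
best, winners = max_cohesion_sets(two_triangles)
print(best, winners)
print(all(is_connected(two_triangles, S) for S in winners))
```

On this example each triangle taken alone has cohesion 1, while their (disconnected) union only reaches 1/10, so every maximizer is connected.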



2 Connected-Cohesive is NP-complete 

First note that given a set $S$ of vertices of $G$, it is possible to verify that $S$
is a solution of Connected-Cohesive by computing its cohesion, its size, its
connectivity and the minimum degree of its vertices, all in polynomial time.
Therefore Connected-Cohesive is in NP.
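A polynomial-time certificate check of this kind could look as follows. This is a sketch under assumptions made here: the graph is an adjacency dictionary, `verify_certificate` is a hypothetical name, and a set without inner triangle has cohesion 0. Triangles are counted edge by edge through common neighborhoods, in the spirit of $\Delta(uv)$, rather than by listing all vertex triples.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def verify_certificate(adj, S, lam):
    """Check that S is connected and that its cohesion is at least lam."""
    S = set(S)
    if not S:
        return False
    # Connectivity of the induced subgraph: depth-first search restricted to S.
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in (adj[u] & S) - seen:
            seen.add(v)
            stack.append(v)
    if seen != S:
        return False
    # For each edge uv inside S, look at the common neighbors of u and v.
    inside_times_3 = outbound = 0
    for u, v in combinations(S, 2):
        if v in adj[u]:
            common = adj[u] & adj[v]
            inside_times_3 += len(common & S)  # each inner triangle is seen from its 3 edges
            outbound += len(common - S)        # each outbound triangle is seen exactly once
    inside = inside_times_3 // 3
    if inside == 0:
        return Fraction(0) >= lam
    value = Fraction(inside ** 2, comb(len(S), 3) * (inside + outbound))
    return value >= lam


# Hypothetical toy graph: triangles abc and bcd inside S, triangle cde outbound.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d", "e"},
    "d": {"b", "c", "e"},
    "e": {"c", "d"},
}
print(verify_certificate(toy, {"a", "b", "c", "d"}, Fraction(1, 3)))  # True
```

Both the search and the counting loop run in time polynomial in the size of the graph, which is all the membership argument requires.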



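Algorithm 1 Transforms an instance of Clique into an instance of
Connected-Cohesive

Require: $G = (V, E)$, $k \in \mathbb{N}$

    $W := \emptyset$
    $E' := E$
    for $uv \in V^2 \setminus E$ do
        let $K$ be a clique of size 2(g)
        $W \leftarrow W \cup K$
        $E' \leftarrow E' \cup \{uv\} \cup (\{u, v\} \times K)$
    end for
    return $G' = (V \cup W, E')$, $\lambda = \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)}$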



Let us now reduce Clique to Connected-Cohesive. Let $(G = (V, E), k \in \mathbb{N})$
be an instance of Clique². We can assume that $G$ is connected (if not, we
use the following reasoning separately on each connected component of $G$). We
construct an instance $(G' = (V', E'), \lambda)$ of Connected-Cohesive by adding
an edge between all non-connected vertices $u$ and $v$ in $G$ and then linking those
two vertices to all vertices in a clique of size 2(g) which we add to the network,
as described in Algorithm 1 and illustrated by Figure 2.

Figure 2: Illustration of Algorithm 1. At this step, we join $u$ and $v$, add a clique
of size 2(g) to the network, and join $u$ and $v$ to all vertices in the added clique.

² We consider here that $|G| > 2$ and $k > 2$; although this is not exactly Clique, this
problem is clearly NP-complete, given that the complexity of Clique does not arise from
those small values.
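The construction of Algorithm 1 can be sketched in Python as below. This is an illustration, not the authors' code: the function name `clique_to_connected_cohesive`, the edge-list input format and the tuple naming of gadget vertices are assumptions made here, and the size of each added clique is left as the caller-supplied parameter `gadget_size`, to be set to the size prescribed by Algorithm 1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def clique_to_connected_cohesive(vertices, edges, k, gadget_size):
    """Build (V', E', lambda) from a Clique instance (G, k); assumes k >= 3."""
    V = list(vertices)
    n = len(V)
    E = {frozenset(e) for e in edges}
    new_edges = set(E)          # E' starts as a copy of E
    W = []                      # vertices of all added cliques
    for u, v in combinations(V, 2):
        if frozenset((u, v)) in E:
            continue            # only non-adjacent pairs receive a gadget
        # Fresh vertices for the added clique K (hypothetical naming scheme).
        K = [("gadget", u, v, j) for j in range(gadget_size)]
        W.extend(K)
        new_edges.add(frozenset((u, v)))                             # join u and v
        new_edges.update(frozenset(p) for p in combinations(K, 2))   # K is a clique
        new_edges.update(frozenset((x, w)) for x in (u, v) for w in K)  # {u, v} x K
    lam = Fraction(comb(k, 3), comb(k, 3) + comb(k, 2) * (n - k))
    return V + W, new_edges, lam


# Hypothetical instance: triangle abc with a pendant vertex d attached to c.
V2, E2, lam = clique_to_connected_cohesive(
    "abcd", [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")], k=3, gadget_size=2
)
print(len(V2), len(E2), lam)  # 8 16 1/4
```

With a gadget size polynomial in $n$, the output has polynomially many vertices and edges, since at most $\binom{n}{2}$ gadgets are added.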

Theorem 2.1. There exists a clique of size $k$ in $G$ iff there exists a connected
group of vertices of $G'$ with cohesion
$\lambda \ge \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)}$.



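Proof. Let $K \subseteq V$ be a clique of size $|K| = k$ in $G$. Given that no nodes or
edges are deleted when constructing $G'$, $G$ is a subgraph of $G'$ and thus $K$ is a
clique in $G'$ and $i_{G'}(K) = \binom{k}{3}$.

Moreover, by construction, $G'[V]$ is a clique and, for every edge $uv$ of $K$, the
common neighbors of $u$ and $v$ in $G'$ are all in $V$, since a vertex of $W$ is only
adjacent to the endpoints of a single edge of $E' \setminus E$. Therefore, each edge in $K$
forms one triangle with each vertex in $V \setminus K$, which leads to
$o_{G'}(K) = \binom{k}{2}(n - k)$. Finally, this gives a cohesion:

$$C_{G'}(K) = \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)}$$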



Conversely, let $S \subseteq V'$ be a connected set of vertices such that
$C_{G'}(S) \ge \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)}$. We will show
that $S$ is a clique of size at least $k$ and that $S \subseteq V$. First note that
$|S| \ge 3$, because by definition, if $|S| < 3$, $C_{G'}(S) = 0$,
which would lead to a contradiction.

First, suppose that $S$ is not a clique in $G$, then let us distinguish two cases:

1. If $S \subseteq V$ and $S$ is not a clique, then $S$ contains two vertices $(u, v) \in V^2$
such that $uv \notin E$.

2. If $S \not\subseteq V$, then $\exists u \in S \setminus V$ and, $S$ being connected, there exists
$v \in S$ such that $uv \notin E$.

Therefore, if $S$ is not a clique in $G$, it contains an edge $uv \notin E$ and by
construction, this edge belongs to at least 2(g) triangles, which leads to:

tG'{S) + OG'{S)>K 

^G'[Sf 



Cg-{S) < 



< 



1 



,4 



(3) 



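Hence the contradiction, therefore $S$ must be a clique in $G$. From there it comes
that:

$$C_{G'}(S) = \frac{\binom{k'}{3}}{\binom{k'}{3} + \binom{k'}{2}(n - k')}$$

where $k' = |S|$. Therefore:

$$\begin{aligned}
& \frac{\binom{k'}{3}}{\binom{k'}{3} + \binom{k'}{2}(n - k')} \ge \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)} \\
\Rightarrow\ & \frac{\binom{k'}{2}(n - k')}{\binom{k'}{3}} \le \frac{\binom{k}{2}(n - k)}{\binom{k}{3}} \\
\Rightarrow\ & \frac{n - k'}{k' - 2} \le \frac{n - k}{k - 2} \\
\Rightarrow\ & k' \ge k
\end{aligned}$$
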

Therefore, we can now conclude that if there exists a connected set $S$ in $G'$ with
cohesion $C_{G'}(S) \ge \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)}$, then $S$
is a clique of size at least $k$ in $G$, and thus there exists a clique $K \subseteq S$ of
size $k$ in $G$. □

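The last implication of the proof relies on the threshold growing with the clique size. A short worked form, assuming $3 \le k \le n$ and writing $\lambda(k)$ (a notation introduced here) for the threshold as a function of $k$:

```latex
\begin{align*}
\lambda(k)
  = \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n-k)}
  = \frac{\frac{k(k-1)(k-2)}{6}}{\frac{k(k-1)(k-2)}{6} + \frac{k(k-1)}{2}(n-k)}
  = \frac{k-2}{(k-2) + 3(n-k)}
  = \frac{k-2}{3n - 2k - 2} .
\end{align*}
```

On $3 \le k \le n$ the numerator $k - 2$ increases with $k$ while the denominator $3n - 2k - 2 \ge n - 2 > 0$ decreases, so $\lambda$ is strictly increasing and $\lambda(k') \ge \lambda(k)$ gives $k' \ge k$. For instance, $n = 5$ and $k = 3$ give $\lambda = \frac{1}{7}$.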
Theorem 2.2. Connected-Cohesive is NP-complete. 

Proof. Per Theorem 2.1, there exists a clique of size $k$ in $G$ iff there exists a
connected subset of vertices of $G'$ of cohesion
$\lambda \ge \frac{\binom{k}{3}}{\binom{k}{3} + \binom{k}{2}(n - k)}$, and the
transformation from $(G, k)$ to $(G', \lambda)$ runs in polynomial time. Thus Clique is
reducible to Connected-Cohesive and Connected-Cohesive is NP-hard.

Given that Connected-Cohesive is in NP, the problem is thus NP-complete. □



3 Conclusion 

The associated decision problem being NP-complete, the problem of finding a
set of vertices with maximum cohesion is NP-hard³.

³ Note that the problem of finding a set of vertices of maximum cohesion containing a set
of predefined vertices is also NP-hard, by an immediate reduction.